fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support #1478

Merged
merged 4 commits into from
Sep 28, 2023

hbredin
Member

@hbredin hbredin commented Sep 27, 2023

Should fix #1475 #1476

I would love feedback from @doublex @guilhermehge @realfolkcode

@hbredin hbredin changed the title from fix: fix WeSpeakerPretrainedSpeakerEmbedding.to("cuda") to fix: fix WeSpeakerPretrainedSpeakerEmbedding GPU support Sep 27, 2023

@realfolkcode realfolkcode left a comment

I have modified the Colab MRE from #1475 to install this fix. Here is the link. However, it made the wall time even worse: 6m 45s. It allocates memory in VRAM, but for some reason the embedding model is extremely slow.

onnxruntime-gpu version: 1.16.0
CUDA version: 11.8

@hbredin
Member Author

hbredin commented Sep 27, 2023

Thanks a lot. That really helped me narrow things down.

I think the issue is that the default onnxruntime behavior is to optimize the computation graph for each new input shape... and it happens that the pyannote speaker diarization pipeline might use a lot of different shapes when processing a file.

microsoft/onnxruntime#6978

I just pushed a new commit. Can you try again?
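
For readers following along, here is a minimal sketch of how per-shape tuning can be turned off when creating an onnxruntime session. It assumes the relevant knob is the CUDA execution provider's `cudnn_conv_algo_search` option; the helper names `cuda_providers` and `make_session` are hypothetical and this is not necessarily the exact change made in this pull request.

```python
from typing import Any


def cuda_providers(device_id: int = 0) -> list:
    """Build a provider list for an onnxruntime session on one CUDA device.

    Hypothetical helper, for illustration only.
    """
    cuda_options: dict[str, Any] = {
        "device_id": device_id,
        # onnxruntime's default is "EXHAUSTIVE", which searches for the best
        # cuDNN convolution algorithm whenever a new input shape shows up.
        # "DEFAULT" skips that search, which helps when shapes vary a lot.
        "cudnn_conv_algo_search": "DEFAULT",
    }
    # CPU is listed last so nodes without a CUDA kernel can still run.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


def make_session(model_path: str, device_id: int = 0):
    """Create an InferenceSession for the ONNX model at `model_path`."""
    import onnxruntime as ort  # imported lazily; requires onnxruntime-gpu

    return ort.InferenceSession(model_path, providers=cuda_providers(device_id))
```

Passing the options at session creation keeps the setting local to the embedding model instead of changing global runtime behavior.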

@guilhermehge

guilhermehge commented Sep 27, 2023

I was testing the solution in an isolated environment using the docker image nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04.

First I tested only adding onnxruntime-gpu==1.16.0 to my requirements along with pyannote.audio==3.0.0, but the time didn't change and the GPU was not used.

Second, I tried using only this commit and I still got the same time, with no GPU being used.

What I want to point out here is that GPU IS NOT BEING USED even though onnxruntime-gpu is installed. Is it possible that we need to allocate the pipeline to the GPU in a different manner? Using the onnx library for instance?

Since you've pushed another commit, I'll build the image again and I'll come back here with the results.
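
One way to check whether the installed onnxruntime build can use the GPU at all is to list its execution providers. The sketch below relies on `onnxruntime.get_available_providers()`; the helper names `uses_cuda` and `report` are hypothetical.

```python
def uses_cuda(providers) -> bool:
    """Return True if the CUDA execution provider is in the given list."""
    return "CUDAExecutionProvider" in providers


def report() -> str:
    """Describe which execution providers the installed onnxruntime exposes."""
    try:
        import onnxruntime as ort
    except ImportError:
        return "onnxruntime is not installed"
    available = ort.get_available_providers()
    if uses_cuda(available):
        return f"CUDA provider available: {available}"
    # A CPU-only build never lists CUDAExecutionProvider,
    # no matter which GPU is present on the machine.
    return f"CUDA provider missing, only: {available}"


if __name__ == "__main__":
    print(report())
```

For an existing session, `session.get_providers()` reports the providers that session was actually created with.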

@guilhermehge

@hbredin still not working with the new commit, I still get the same embedding time and the GPU is not being used. Here's a snippet of nvidia-smi while the embedding was at 40%

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000001:00:00.0 Off |                  Off |
| N/A   37C    P0    25W /  70W |   4863MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I remind you that this is in an isolated environment using a docker container. The GPU works fine for the older diarization pipeline @2.1 and for the faster_whisper algorithm, but for the embedding model of the new pipeline, it does not work.

@hbredin
Member Author

hbredin commented Sep 27, 2023

That's weird because it solves the issue on Google Colab.
pyannote/speaker-diarization-3.0 is even slightly faster than pyannote/speaker-diarization-2.1.

@hbredin
Member Author

hbredin commented Sep 27, 2023

I have no knowledge of docker containers.

Could it be something related to an incompatibility between onnxruntime-gpu and docker/cuda images?

Are you 100% sure that it used the latest commit and no cache was used?

@guilhermehge

guilhermehge commented Sep 27, 2023

Yes, I am sure. I rebuilt the image from scratch and checked that your commit was in fact in the code. I'll go check the Colab with your solution. As you mentioned, the problem might be a dependency problem with the specific image that I'm using in docker. I'll check it out and let you know.

Just FYI, docker containers are isolated environments that only run what we need for the application that we're using. It should work for all cases, not only for Colab.

Edit: Indeed it worked in my MRE colab. I'll check it out in my docker container to see if I can make it work.

@guilhermehge

guilhermehge commented Sep 27, 2023

Update: I did a pip install --force-reinstall onnxruntime-gpu and it worked on the docker container, but when loading the pipeline, I got the following warning:

2023-09-27 15:15:43.525160455 [W:onnxruntime:, session_state.cc:1162 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-09-27 15:15:43.525193659 [W:onnxruntime:, session_state.cc:1164 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

Do you know what it might be?

I believe I know what the problem is: I'm also installing faster_whisper in this environment, and faster_whisper's requirements are:

av==10.*
ctranslate2>=3.17,<4
huggingface_hub>=0.13
tokenizers>=0.13,<0.15
onnxruntime>=1.14,<2

So it installs onnxruntime, while your library installs onnxruntime-gpu. I'll see if I can sort this out.

I believe you may complete this pull request; this is a problem at my end and your code is working.

Question: Will you publish these alterations on the pypi package? Like a 3.0.1 version?
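
The clash described above can be detected up front. Both `onnxruntime` and `onnxruntime-gpu` provide the same `onnxruntime` import package, so having both installed can leave the CPU-only build in place. A minimal sketch using the standard library's `importlib.metadata` follows; the function names are hypothetical.

```python
from importlib import metadata


def conflicting_onnxruntime(installed) -> bool:
    """Return True if both the CPU and GPU onnxruntime wheels are installed."""
    # Normalize names so "onnxruntime_gpu" and "ONNXRuntime-GPU" also match.
    names = {name.lower().replace("_", "-") for name in installed}
    return {"onnxruntime", "onnxruntime-gpu"} <= names


def installed_distributions() -> list:
    """List the names of all distributions visible to this interpreter."""
    return [
        dist.metadata["Name"]
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    ]


if __name__ == "__main__":
    if conflicting_onnxruntime(installed_distributions()):
        print("Both onnxruntime and onnxruntime-gpu are installed.")
```

If the check fires, the `pip install --force-reinstall onnxruntime-gpu` step mentioned earlier in this comment is the remedy that worked here.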

@hbredin
Member Author

hbredin commented Sep 27, 2023

Thanks. Will make a few more tests on my side and will then merge.

Question: Will you publish these alterations on the pypi package? Like a 3.0.1 version?

Yes, it will be released as 3.0.1.

@guilhermehge

guilhermehge commented Sep 27, 2023

Just as a side note, I believe your model will be used with faster_whisper, and using onnxruntime-gpu may make it incompatible with that library. I am going to run a few more tests and let you know my results, but so far, faster_whisper stopped working when I uninstalled onnxruntime to leave only onnxruntime-gpu. Do you believe there is another alternative? Like porting your model outside of onnx?

I posted an issue on faster_whisper's repo to address the situation.

@hbredin
Member Author

hbredin commented Sep 27, 2023

Do you believe there is another alternative? Like porting your model outside of onnx?

The point is that this is not my model. pyannote does not (yet) have a good speaker embedding model of its own. It uses external ones.

Working on it, though ;-)

@guilhermehge

Oh, fair enough, but is it possible to convert it so that it doesn't use onnx?

@hbredin
Member Author

hbredin commented Sep 28, 2023

Oh, fair enough, but is it possible to convert it so that it doesn't use onnx?

Issue #1477 has already been opened related to this particular aspect.
Let's continue this discussion there. But, short answer: I don't (yet) know how to do that.

@hbredin hbredin merged commit e478d57 into develop Sep 28, 2023
@hbredin hbredin deleted the fix/onnxruntime-gpu branch September 28, 2023 19:36
@hbredin
Member Author

hbredin commented Sep 28, 2023

I just released 3.0.1, including this fix.

@guilhermehge

Awesome! Just checked pypi, great job! FYI, I believe it's not showing on GitHub's releases yet.

fimad added a commit to fimad/whisperX that referenced this pull request Oct 13, 2023
pyannote 3.0.0 has a bug where the new embedding model does not run on the GPU. This is fixed in version 3.0.1 via pyannote/pyannote-audio#1478.
@hbredin
Member Author

hbredin commented Nov 9, 2023

FYI: #1537

Development

Successfully merging this pull request may close these issues.

pipeline.to(torch.device("cuda")) not working on T4 Tesla GPU (pyannote==3.0.0)
3 participants